Background and Data

COVID-19 is causing havoc in Oregon once again, and as numbers continue to spike, I decided to revisit one of the first projects I worked on in RStudio. That project can be seen here, and used data from Johns Hopkins. However, because that data is no longer updated, this investigation will use the more current data from the NY Times. The repository for the NY Times data can be found here, and the datasets being included are:

  1. us.states : state-level data (file description here)

  2. us.counties : county-level data (file description here)

  3. colleges : number of reported cases among students and employees at American colleges and universities, updated May 26th (file description here)

  4. mask_use : survey between July 2 and July 14 (2020) where participants were asked, “How often do you wear a mask in public when you expect to be within six feet of another person?” (file description here)

  5. vacc : daily state-level COVID-19 vaccination time series from the Johns Hopkins University repository (file description here)

  6. policytrackerOR : dates and descriptions of policies going into or out of effect in Oregon. To load data for a different state, go here and find the name of the state file you want to work with.

Here is the data:

library(readr)  # provides read_csv()

us.states <- read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv")
us.counties <- read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv")
colleges <- read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/colleges/colleges.csv")
# mask-use survey estimates, one row per county
mask_use <- read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/mask-use/mask-use-by-county.csv")
vacc <- read_csv("https://raw.githubusercontent.com/govex/COVID-19/master/data_tables/vaccine_data/us_data/time_series/people_vaccinated_us_timeline.csv")
policytrackerOR <- read_csv("https://raw.githubusercontent.com/govex/COVID-19/govex_data/data_tables/policy_data/table_data/Current/Oregon_policy.csv")

The project cited above mainly looked at case, death, and vaccination numbers per state to compare highly and mildly impacted states. In this project I will look at highly impacted states and the counties of Oregon. Additionally, this project will use population and density data from the tidycensus package. I discuss how I got this data using an API in a blog post here. Note that this data is from 2019, so it is a couple of years older than the COVID data.

# tidycensus
library(tidycensus)  # get_estimates(); needs a Census API key
library(dplyr)       # %>% and rename()

# State: POP and DENSITY data
state.pop <- get_estimates(geography = "state", year = 2019, variables = "POP") %>%
  rename(state = NAME, population = value)
state.den <- get_estimates(geography = "state", year = 2019, variables = "DENSITY") %>%
  rename(state = NAME, density = value)

# Oregon counties: POP and DENSITY data
or.county.pop <- get_estimates(geography = "county", state = "OR", year = 2019, variables = "POP") %>%
  rename(county = NAME, population = value)
or.county.den <- get_estimates(geography = "county", state = "OR", year = 2019, variables = "DENSITY") %>%
  rename(county = NAME, density = value)

Wrangling the Data

The COVID data set is already in long form (meaning the dates are in rows instead of columns), and the date is already stored as a date variable. The main tasks here are therefore to join the us.states data set with the vaccination records to create us.states.vacc, join that with the population and density estimates, and create new percentage columns.

## # A tibble: 5 × 12
##   date       state      population density cases     case.per deaths perc.deaths
##   <date>     <chr>           <dbl>   <dbl> <dbl>        <dbl>  <dbl>       <dbl>
## 1 2020-01-21 Washington    7614893    115.     1 0.000000131       0           0
## 2 2020-01-22 Washington    7614893    115.     1 0.000000131       0           0
## 3 2020-01-23 Washington    7614893    115.     1 0.000000131       0           0
## 4 2020-01-24 Illinois     12671821    228.     1 0.0000000789      0           0
## 5 2020-01-24 Washington    7614893    115.     1 0.000000131       0           0
## # … with 4 more variables: full.vacc <dbl>, full.vacc.perc <dbl>,
## #   part.vacc <dbl>, part.vacc.perc <dbl>

Note: People_Fully_Vaccinated and People_Partially_Vaccinated show as NA because the dates displayed are from before the vaccine was released.
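The joins described above are not shown in the post, so here is a minimal sketch of how they could be done with dplyr. It uses tiny in-memory stand-ins built from the rows printed above instead of the downloaded files. The vaccination column names Province_State and Date are assumptions about that file, and the density values are the rounded ones from the printout. Because the toy tables only hold state, population, and density, the extra columns that get_estimates() returns would need to be dropped before a real join.

```r
library(dplyr)

# Toy stand-ins; values copied from the printed output above.
us.states <- tibble(
  date   = as.Date(c("2020-01-21", "2020-01-24")),
  state  = c("Washington", "Illinois"),
  cases  = c(1, 1),
  deaths = c(0, 0)
)
state.pop <- tibble(state = c("Washington", "Illinois"),
                    population = c(7614893, 12671821))
state.den <- tibble(state = c("Washington", "Illinois"),
                    density = c(115, 228))  # rounded, as displayed

# Assumed shape of the vaccination table (Province_State and Date are
# assumed names). It has no rows here because these dates come before
# any vaccinations were recorded.
vacc <- tibble(
  Province_State = character(),
  Date = as.Date(character()),
  People_Fully_Vaccinated = numeric(),
  People_Partially_Vaccinated = numeric()
)

us.states.vacc <- us.states %>%
  # left joins keep every case row, even without a vaccination record
  left_join(vacc, by = c("state" = "Province_State", "date" = "Date")) %>%
  left_join(state.pop, by = "state") %>%
  left_join(state.den, by = "state") %>%
  transmute(
    date, state, population, density,
    cases,  case.per    = cases / population,
    deaths, perc.deaths = deaths / population,
    full.vacc = People_Fully_Vaccinated,
    full.vacc.perc = full.vacc / population,
    part.vacc = People_Partially_Vaccinated,
    part.vacc.perc = part.vacc / population
  )

us.states.vacc
```

In this sketch the "percentage" columns are stored as proportions (for example 1 / 7614893 for Washington), which matches the magnitudes in the printout; multiplying by 100 at display time gives the percentages used in the tables further down.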

Next, join the Oregon county population data with the density data, remove “County, Oregon” from each county name, filter the us.counties data for Oregon, and then join the result with the per-county population data.

## # A tibble: 10 × 8
##    date       county     population density cases cases.perc deaths deaths.perc
##    <date>     <chr>           <dbl>   <dbl> <dbl>      <dbl>  <dbl>       <dbl>
##  1 2020-02-28 Washington     601592   831.      1 0.00000166      0           0
##  2 2020-02-29 Washington     601592   831.      1 0.00000166      0           0
##  3 2020-03-01 Washington     601592   831.      2 0.00000332      0           0
##  4 2020-03-02 Washington     601592   831.      2 0.00000332      0           0
##  5 2020-03-03 Washington     601592   831.      2 0.00000332      0           0
##  6 2020-03-04 Washington     601592   831.      2 0.00000332      0           0
##  7 2020-03-05 Washington     601592   831.      2 0.00000332      0           0
##  8 2020-03-06 Washington     601592   831.      2 0.00000332      0           0
##  9 2020-03-07 Jackson        220944    79.4     2 0.00000905      0           0
## 10 2020-03-07 Klamath         68238    11.5     1 0.0000147       0           0
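The county steps described above could look roughly like the following sketch. It is not the post's original code. The toy tables reuse numbers from the printout, plus the first row of the NY Times county file (Snohomish, Washington) to show the state filter at work. The name or.pop.den is a made-up intermediate, and the toy census tables leave out the extra columns that get_estimates() returns.

```r
library(dplyr)
library(stringr)

# Toy stand-ins for the tidycensus results, using values printed above.
or.county.pop <- tibble(
  county = c("Washington County, Oregon", "Jackson County, Oregon"),
  population = c(601592, 220944)
)
or.county.den <- tibble(
  county = c("Washington County, Oregon", "Jackson County, Oregon"),
  density = c(831, 79.4)  # rounded, as displayed
)
us.counties <- tibble(
  date   = as.Date(c("2020-01-21", "2020-02-28", "2020-03-07")),
  county = c("Snohomish", "Washington", "Jackson"),
  state  = c("Washington", "Oregon", "Oregon"),
  cases  = c(1, 1, 2),
  deaths = c(0, 0, 0)
)

# Join population with density, then strip the census suffix so the
# names match the NY Times county names.
or.pop.den <- or.county.pop %>%
  inner_join(or.county.den, by = "county") %>%
  mutate(county = str_remove(county, " County, Oregon$"))

or.counties <- us.counties %>%
  filter(state == "Oregon") %>%           # keep only Oregon rows
  inner_join(or.pop.den, by = "county") %>%
  transmute(date, county, population, density,
            cases,  cases.perc  = cases / population,
            deaths, deaths.perc = deaths / population)

or.counties
```

Filtering on state before the join matters because county names repeat across states: without it, a Washington County from another state would also match Oregon's Washington County.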

Using the data

Looking at States

To begin, let's look at the country as a whole, by state. The data will be filtered for 2022-02-02, and then we will look at the top five states (ten for the highest percentage of cases) with:

Highest number of cases

Most Cases
By State as of 2022-02-02

| State | Total Population | Cases | Percentage |
| --- | --- | --- | --- |
| California | 39,512,223 | 8,484,152 | 21.47% |
| Texas | 28,995,881 | 6,280,918 | 21.66% |
| Florida | 21,477,737 | 5,572,757 | 25.95% |
| New York | 19,453,561 | 4,804,328 | 24.70% |
| Illinois | 12,671,821 | 2,943,829 | 23.23% |
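As a rough sketch of how a table like the one above could be produced, the snippet below filters for the date, keeps the five largest case counts with slice_max(), and formats the numbers for display. The names latest and top.cases are made up for illustration, and the input is just the five rows from the table, standing in for the full wrangled state data.

```r
library(dplyr)

# Stand-in for the wrangled state table, using the five rows shown above
# (deliberately out of order so the ranking step does some work).
latest <- tibble(
  date = as.Date("2022-02-02"),
  state = c("Illinois", "California", "New York", "Texas", "Florida"),
  population = c(12671821, 39512223, 19453561, 28995881, 21477737),
  cases = c(2943829, 8484152, 4804328, 6280918, 5572757)
)

top.cases <- latest %>%
  filter(date == as.Date("2022-02-02")) %>%
  slice_max(cases, n = 5) %>%   # five largest counts, sorted descending
  mutate(
    # compute the percentage before the counts are turned into text
    percentage = sprintf("%.2f%%", 100 * cases / population),
    across(c(population, cases),
           ~ format(.x, big.mark = ",", trim = TRUE, scientific = FALSE))
  )

top.cases
```

Ranking by percentage instead only needs a different ordering column, for example slice_max(cases / population, n = 5).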

Highest percentage of cases

Highest Percent of Cases
By State as of 2022-02-02

| State | Total Population | Cases | Percentage |
| --- | --- | --- | --- |
| Rhode Island | 1,059,361 | 345,836 | 32.65% |
| Alaska | 731,545 | 219,767 | 30.04% |
| North Dakota | 762,062 | 226,745 | 29.75% |
| Utah | 3,205,958 | 892,141 | 27.83% |
| South Carolina | 5,148,714 | 1,394,651 | 27.09% |
| Tennessee | 6,829,174 | 1,835,278 | 26.87% |
| Kentucky | 4,467,673 | 1,188,934 | 26.61% |
| Wisconsin | 5,822,434 | 1,526,321 | 26.21% |
| Arkansas | 3,017,804 | 786,010 | 26.05% |
| Florida | 21,477,737 | 5,572,757 | 25.95% |

Highest number of deaths

Most Deaths
By State as of 2022-02-02

| State | Total Population | Deaths | Percentage |
| --- | --- | --- | --- |
| California | 39,512,223 | 80,732 | 0.20% |
| Texas | 28,995,881 | 80,221 | 0.28% |
| Florida | 21,477,737 | 65,273 | 0.30% |
| New York | 19,453,561 | 64,646 | 0.33% |
| Pennsylvania | 12,801,989 | 41,028 | 0.32% |

Highest percentage of deaths

Highest Percent of Deaths
By State as of 2022-02-02

| State | Total Population | Deaths | Percentage |
| --- | --- | --- | --- |
| Mississippi | 2,976,149 | 11,170 | 0.38% |
| Arizona | 7,278,717 | 26,369 | 0.36% |
| New Jersey | 8,882,190 | 31,663 | 0.36% |
| Alabama | 4,903,185 | 17,215 | 0.35% |
| Louisiana | 4,648,794 | 15,781 | 0.34% |

Most people fully vaccinated

Most People Vaccinated
By State as of 2022-02-02

| State | Total Population | People Fully Vaccinated | Percentage |
| --- | --- | --- | --- |
| California | 39,512,223 | 27,592,347 | 69.83% |
| Texas | 28,995,881 | 17,100,467 | 58.98% |
| New York | 19,453,561 | 14,445,767 | 74.26% |
| Florida | 21,477,737 | 13,995,404 | 65.16% |
| Pennsylvania | 12,801,989 | 8,434,737 | 65.89% |

Highest percentage of population fully vaccinated

Highest Percent of People Fully Vaccinated
By State as of 2022-02-02

| State | Total Population | People Fully Vaccinated | Percentage |
| --- | --- | --- | --- |
| District of Columbia | 705,749 | 614,182 | 87.03% |
| Puerto Rico | 3,193,694 | 2,562,862 | 80.25% |
| Vermont | 623,989 | 495,818 | 79.46% |
| Rhode Island | 1,059,361 | 838,402 | 79.14% |
| Maine | 1,344,212 | 1,043,014 | 77.59% |

Looking at Oregon Counties

Next, let's look at the data a little closer to home, for Oregon counties. The data is first filtered by the most recent date, which as of this writing is 2022-02-02. We then look at a graph of the state as a whole, followed by the top five Oregon counties.

Highest number of cases

Most Cases
Oregon Counties as of 2022-02-02

| County | Total Population | Cases | Percentage |
| --- | --- | --- | --- |
| Multnomah | 812,855 | 103,905 | 12.78% |
| Washington | 601,592 | 77,862 | 12.94% |
| Marion | 347,818 | 63,561 | 18.27% |
| Clackamas | 418,187 | 55,309 | 13.23% |
| Lane | 382,067 | 51,066 | 13.37% |

Highest percentage of Cases

Highest Percentage of Cases
Oregon Counties as of 2022-02-02

| County | Total Population | Cases | Percentage |
| --- | --- | --- | --- |
| Umatilla | 77,950 | 21,278 | 27.30% |
| Jefferson | 24,658 | 6,604 | 26.78% |
| Malheur | 30,571 | 7,761 | 25.39% |
| Morrow | 11,603 | 2,861 | 24.66% |
| Crook | 24,404 | 5,475 | 22.43% |

Highest number of deaths

Most Deaths
Oregon Counties as of 2022-02-02

| County | Total Population | Deaths | Percentage |
| --- | --- | --- | --- |
| Multnomah | 812,855 | 970 | 0.12% |
| Marion | 347,818 | 587 | 0.17% |
| Clackamas | 418,187 | 485 | 0.12% |
| Washington | 601,592 | 481 | 0.08% |
| Jackson | 220,944 | 439 | 0.20% |

Highest percentage of Deaths

Highest Percent of Deaths
Oregon Counties as of 2022-02-02

| County | Total Population | Deaths | Percentage |
| --- | --- | --- | --- |
| Harney | 7,393 | 36 | 0.49% |
| Josephine | 87,487 | 303 | 0.35% |
| Malheur | 30,571 | 98 | 0.32% |
| Douglas | 110,980 | 335 | 0.30% |
| Jefferson | 24,658 | 73 | 0.30% |
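
Since the four Oregon tables differ only in the column used for ranking, one small helper could cover all of them. The function top_by() below is a hypothetical name for illustration, not something from the original project, and the toy data holds only five counties taken from the tables above, so its rankings apply to that subset and not to the whole state.

```r
library(dplyr)

# Five counties copied from the tables above (a subset, not all of Oregon).
or.counties <- tibble(
  date = as.Date("2022-02-02"),
  county = c("Multnomah", "Washington", "Marion", "Malheur", "Jefferson"),
  population = c(812855, 601592, 347818, 30571, 24658),
  cases  = c(103905, 77862, 63561, 7761, 6604),
  deaths = c(970, 481, 587, 98, 73)
) %>%
  mutate(cases.perc  = cases / population,
         deaths.perc = deaths / population)

# Hypothetical helper: the n rows of `data` on `day` with the largest
# values in the column named by the string `column`.
top_by <- function(data, column, day, n = 5) {
  data %>%
    filter(date == as.Date(day)) %>%
    slice_max(.data[[column]], n = n)
}

top_by(or.counties, "cases", "2022-02-02", n = 2)        # by raw count
top_by(or.counties, "deaths.perc", "2022-02-02", n = 2)  # by share of population
```

Passing the column as a string and looking it up with .data[[column]] keeps the helper usable in loops, for example when building all four tables from a vector of column names.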